Texas is a critical state in American politics. It has long been a conservative bulwark in the Electoral College and a reliable source of Republican members of Congress, but its rapidly changing population has sparked hope among Democrats that they may someday be able to "turn Texas blue." Texas is one of a number of "Sun Belt" states (also including Arizona, Georgia, Florida, and North Carolina) that have been identified as historically Republican but on the verge of tilting Democratic.
This portfolio explores recent congressional elections in Texas from a variety of perspectives and examines whether Democrats are justified in aggressively pursuing the state in the 2020 elections. A wealth of data from disparate sources is available to examine trends from several angles, including FEC financial data to see how candidates of both parties are funded (and which individuals are contributing to them), demographic and Census data to examine the socioeconomic composition of the electorate, and election data at various levels of geographical aggregation.
I'll begin by installing and loading required libraries.
!pip install geopandas
!pip install altair -U
import pandas as pd
import geopandas as gpd
import numpy as np
import altair as alt
from vega_datasets import data
import math
import json
from IPython.display import Image
For my portfolio, I'll be examining a variety of election-related phenomena (voting results, demographics, and campaign finance) at the congressional level in Texas. First, before getting to any of the analysis, I'll define and activate a custom theme to apply to my charts.
# Define and call custom theme to apply to graphs
def custom_theme():
'''
Adapted from https://towardsdatascience.com/consistently-beautiful-visualizations-with-altair-themes-c7f9f889602
'''
# Font
font = "Calibri"
labelFont = "Calibri"
sourceFont = "Calibri"
# Axes
axisColor = "#000000"
gridColor = "#DEDDDD"
# Main colors palette
main_palette = ["#ffd700", "#ffb14e", "#fa8775", "#ea5f94",
"#cd34b5", "#9d02d7", "#0000ff", "#000000"]
# Sequential palette
sequential_palette = ["#edf8fb", "#bfd3e6", "#9ebcda",
"#8c96c6", "#8856a7", "#810f7c"]
markColor = "#000000"
return {
"width": 685,
"height": 380,
"config": {
"title": {
"fontSize": 18,
"font": font,
"anchor": "start", # left-aligned
"color": "#000000", # black; Vega-Lite's title config key is "color"
"subtitleFont": font,
"subtitleFontSize": 12,
"subtitleColor": "#000000"
},
"axisX": {
"domain": True,
"domainColor": axisColor,
"domainWidth": 1,
"grid": False,
"labelFont": labelFont,
"labelFontSize": 12,
"labelAngle": 0,
"tickColor": axisColor,
"tickSize": 5, # default, can update
"titleFont": font,
"titleFontSize": 12,
"titlePadding": 10
},
"axisY": {
"domain": False,
"grid": True,
"gridColor": gridColor,
"gridWidth": 1,
"labelFont": labelFont,
"labelFontSize": 12,
"ticks": False,
"titleFont": font,
"titleFontSize": 12,
"titlePadding": 10
},
"legend": {
"labelFont": labelFont,
"labelFontSize": 12,
"titleFont": font,
"titleFontSize": 12,
"title": "", # Default legend title is empty string
}
}
}
alt.themes.register("custom_theme", custom_theme)
alt.themes.enable("custom_theme")
The final visualization I want to make is a plot of political party strength (that is, the relative advantage/disadvantage each party has in terms of seats in the Texas delegation to the US House) over time, plotted against the average DW-NOMINATE score for all Texas representatives to each session of the US Congress dating back to 1900. I'm interested to see how the two parties' positions, both in terms of seats held and expressed ideology, have evolved over time. DW-NOMINATE is an algorithm that unfolds binary vote choice data on roll call votes that legislators cast in order to assign an ideological "score" to each legislator, so that scores place legislators in a mappable space. Legislators with scores that are relatively close to each other are considered to be ideologically similar. DW-NOMINATE's scale runs from -1 (most liberal) to 1 (most conservative). This scaling method was designed by Poole and Rosenthal, and they make data for all legislators in all US Congresses available back to the founding of the nation on Voteview.
I read in the Voteview data, filtered to only Texas representatives, and calculated the total seat advantage for the Republican party in each Congress going back to the 56th, which started in 1899. I also calculated the mean DW-NOMINATE score for all the representatives in the Texas delegation by Congress, so we can see how the legislators representing Texas have changed in their ideological leanings over time.
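The pre-processing described above happened outside this notebook, so the following is only a minimal sketch of how the two measures could be derived with pandas. It assumes Voteview-style member columns (`congress`, `chamber`, `state_abbrev`, `party_code` with 100 for Democrats and 200 for Republicans, and `nominate_dim1`); the rows are made-up toy values, and `summarize_delegation` is a hypothetical helper, not the code that produced `TX_ideo_historical.csv`.

```python
import pandas as pd

# Toy rows shaped like a Voteview member file (values are invented for illustration).
members = pd.DataFrame({
    "congress": [56, 56, 56, 109, 109, 109],
    "chamber": ["House"] * 6,
    "state_abbrev": ["TX"] * 6,
    "party_code": [100, 100, 200, 200, 200, 100],  # assumed: 100 = Democrat, 200 = Republican
    "nominate_dim1": [-0.4, -0.2, 0.3, 0.5, 0.4, -0.3],
})

def summarize_delegation(df, state="TX"):
    """Return one row per Congress with seat advantage and mean first-dimension score."""
    house = df[(df["state_abbrev"] == state) & (df["chamber"] == "House")]
    house = house.assign(
        is_rep=house["party_code"] == 200,
        is_dem=house["party_code"] == 100,
    )
    out = house.groupby("congress").agg(
        rep_seats=("is_rep", "sum"),
        dem_seats=("is_dem", "sum"),
        avg_ideo=("nominate_dim1", "mean"),
    ).reset_index()
    # Positive values mean Republicans hold more seats than Democrats.
    out["rep_majority"] = out["rep_seats"] - out["dem_seats"]
    # The 1st Congress convened in 1789 and each Congress spans two years.
    out["year"] = 1789 + 2 * (out["congress"] - 1)
    return out

summary = summarize_delegation(members)
print(summary[["congress", "year", "rep_majority", "avg_ideo"]])
```

The resulting `rep_majority`, `avg_ideo`, and `year` columns match the names the charts below read from the CSV.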
# Read in data and convert year to datetime
ideology = pd.read_csv('TX_ideo_historical.csv')
ideology['year'] = pd.to_datetime(ideology['year'], format='%Y')
ideology.head()
# Build bar chart of seat advantage metric
seat_advantage = alt.Chart(ideology).mark_bar().encode(
alt.X('rep_majority:Q', axis=alt.Axis(title='Republican seat advantage'),
scale=alt.Scale(domain=[-25, 25])),
alt.Y('year:O',
axis=alt.Axis(formatType='time', format='%Y', title='Session year')
),
color=alt.condition(
alt.datum.rep_majority > 0,
alt.value('#f82222'),
alt.value('#1d64dc')
)
).properties(
width=200,
height=500,
title = {
'text':' ',
'subtitle':'Partisan advantage of seats in US House',
'anchor':'middle',
'align':'center',
'subtitleFont':'Calibri'
}
)
# Build line plot of average ideology
ideo_line = alt.Chart(ideology).mark_line(color='black', strokeWidth=3).encode(
alt.X('avg_ideo:Q',
axis=alt.Axis(title='Average DW-NOMINATE score'),
scale=alt.Scale(domain=[-.5,.5])),
alt.Y('year:O',
axis=alt.Axis(formatType='time', format='%Y', title=None)
)
).properties(
width=200,
height=500,
title = {
'text':" ",
'subtitle':'Average DW-NOMINATE score of delegation',
'anchor':'middle',
'subtitleFont':'Calibri',
'align':'center'
}
)
alt.hconcat(seat_advantage, ideo_line).properties(
title = {
'text':['Democrats historically held the majority of congressional seats in Texas,',
'but Republicans took over in the mid-2000s and moved to the right.']
}
)
Source: Poole & Rosenthal Voteview data.
Although Texas is commonly described as a solidly Republican state, Democrats have historically held the majority of seats in its congressional delegation. Even after the partisan realignment that accompanied the "Southern strategy" throughout the mid- and late-20th century, Democrats held the majority and it wasn't until the 109th Congress in 2005 that Republicans captured the majority of congressional seats in Texas (perhaps aided by former Texas Governor George W. Bush's reelection).
The trend toward Republicans is mirrored in the right-hand plot, which charts the average DW-NOMINATE score for all members in the Texas congressional delegation. The delegation was generally quite liberal in the early 1900s, but moved to the right into squarely moderate territory throughout the middle and latter part of the century. By the time Republicans took the majority of seats in the 2000s, the average Texas congressperson had a DW-NOMINATE score greater than 0.0, indicating generally conservative beliefs.
One of the main takeaways from this chart is the relative recency of Republicans' strength in Texas, after generational Democratic dominance for more than a century.
Next, I'm going to generate a simple choropleth map of vote totals for Democratic US House candidates at the county level in 2018. This approach relies on GeoJSON files for counties in Texas sourced from the Texas Natural Resource Information System. Election results come from the Texas Secretary of State's repository of election returns. Specifically, I downloaded all results files for elections 2012-2018, because those elections were held under the current redistricting plan. This analysis does not focus on the congressional district level, but this distinction will be useful for future visualizations.
Before loading the data into Python, I aggregated the election results to the county level (their raw form is at the precinct level). I also calculated percentage measures of voter registration, turnout, and Democratic vote share at the county level for all general elections 2012-2018. Although this analysis focuses on 2018 vote shares and percentage point changes in voter turnout between 2018 and the prior midterm in 2014, these measures can all be used in future maps.
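Since that aggregation step is not shown in the notebook, here is a rough sketch of what it might look like in pandas. The precinct-level column names (`dem_votes`, `total_votes`, `registered`) and the toy numbers are assumptions for illustration; only `FIPS`, `demperc`, and `toperc` mirror names used later in this analysis.

```python
import pandas as pd

# Hypothetical precinct-level returns (invented values and column names).
precincts = pd.DataFrame({
    "FIPS": ["48201", "48201", "48113"],
    "dem_votes": [300, 200, 450],
    "total_votes": [600, 400, 1000],
    "registered": [1000, 1000, 2000],
})

def county_measures(df):
    """Roll precinct counts up to counties, then compute shares from the totals."""
    # Sum raw counts first; averaging precinct percentages would give
    # small precincts the same weight as large ones.
    county = df.groupby("FIPS", as_index=False)[
        ["dem_votes", "total_votes", "registered"]
    ].sum()
    county["demperc"] = county["dem_votes"] / county["total_votes"]
    county["toperc"] = county["total_votes"] / county["registered"]
    return county

county = county_measures(precincts)
print(county)
```

Repeating this per election year and merging on `FIPS` would give a wide table along the lines of `house_results_12-18.csv`, with one set of columns per year.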
# With thanks to https://medium.com/dataexplorations/creating-choropleth-maps-in-altair-eeb7085779a1
with open('tx_counties.geojson') as json_data:
tx_counties = json.load(json_data)
gdf = gpd.GeoDataFrame.from_features((tx_counties))
# "demperc" is the percent of popular vote for US House received by Democratic candidates
# "vrperc" is the percent of total population registered to vote
# "toperc" is the percent of registered voters that turned out to vote
results = pd.read_csv('house_results_12-18.csv')
results['FIPS'] = results['FIPS'].apply(str)
results['demchange1418'] = results['demperc18'] - results['demperc14']
results['tochange1418'] = results['toperc18'] - results['toperc14']
results.head()
gdf_all = gdf.merge(results, on='FIPS', how='left')
choro_json = json.loads(gdf_all.to_json())
choro_data = alt.Data(values=choro_json['features'])
def yearly_results_choro(choro_data, fillvar, year, legend=True):
'''
This function creates a county choropleth map of Democratic percentage of vote
in US House elections, using the choropleth dataset defined above
'''
base = alt.Chart(choro_data).mark_geoshape(
stroke='black',
strokeWidth=1
).encode(
)
if legend:
choro = alt.Chart(choro_data).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill(fillvar,
type='quantitative',
scale=alt.Scale(domain=[0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1],
range=["#a80000", "#c21b18", "#d72f30",
"#d75d5d", "#e27f7f", "#a5b0ff",
"#799be2", "#6674d3", "#584cde",
"#0f38bf"]),
legend=alt.Legend(title=["Democrats' share", 'of votes received'],
orient='bottom-left', padding=10, fillColor='white',
strokeColor='black', format='%',
values=[0, .25, 0.5, .75, 1],
titleLineHeight=12, titleLimit=100))
)
else:
choro = alt.Chart(choro_data).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill(fillvar,
type='quantitative',
scale=alt.Scale(domain=[0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1],
range=["#a80000", "#c21b18", "#d72f30",
"#d75d5d", "#e27f7f", "#a5b0ff",
"#799be2", "#6674d3", "#584cde",
"#0f38bf"]),
legend=None)
)
results = (base + choro).properties(
width=400,
height=400,
title={
'text':' ',
'subtitle':year,
'anchor':'middle',
'subtitleFontWeight':'bold',
'subtitleFontSize':14
}
)
return results
results14 = yearly_results_choro(choro_data, 'properties.demperc14', '2014', legend=False)
results18 = yearly_results_choro(choro_data, 'properties.demperc18', '2018')
results_midterms = (results14 | results18).configure(
background='#FFFFFF',
autosize='pad',
).properties(
title={
'text':'Democratic performance improved in urban areas from the 2014 to 2018 midterms',
'subtitle':'Percentage Democratic vote in US House races by county',
'subtitleFont':'Calibri'
}
)
#results_midterms.configure_view(strokeOpacity=0)
Image('2_results14_18.png', width=800)
Source: Texas Secretary of State election results.
This first map demonstrates that Democrats performed well in urban areas including Houston, Dallas, Austin, San Antonio and El Paso. Meanwhile, Republicans performed well in rural and exurban areas. Crucially, a comparison of the maps for the 2018 and 2014 midterms shows that Democrats improved their performance in terms of vote share in some key high-population suburban counties, especially those bordering Harris County (Houston), Dallas County, and Travis County (Austin).
Next, we'll examine the change in turnout between the 2014 and 2018 midterms.
gdf_all['tochange1418'].describe()
# Turnout
base_to = alt.Chart(choro_data).mark_geoshape(
stroke='black',
strokeWidth=1
).encode(
)
choro_to = alt.Chart(choro_data).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill('properties.tochange1418',
type='quantitative',
scale=alt.Scale(domain=[-.1, 0, .1, .2, .3, .4, .5, .6, .7],
# Scale drawn from Color Brewer
range=["#ffffe5", "#f7fcb9", "#d9f0a3", "#addd8e",
"#78c679", "#41ab5d", "#238443", "#005a32"]),
legend=alt.Legend(title=["Change in", "turnout"],
orient='bottom-left', padding=5, fillColor='white',
strokeColor='black', format='%',
values=[-.1, 0, .25, .5, .75],
titleLineHeight=12, titleLimit=100))
)
change_to = (base_to + choro_to).properties(
width=500,
height=500,
title={
'text':'Percentage change in overall turnout',
'subtitle':'Between midterm elections for US House, 2014-2018',
'subtitleFont':'Calibri'
}
).configure(
background='#FFFFFF'
)
change_to.configure_view(strokeOpacity=0)
Source: Texas Secretary of State election results.
It appears that, with only a few exceptions, turnout was up all over the state of Texas. The dark green counties correspond to Texas' 4th congressional district: the Republican incumbent John Ratcliffe ran there unopposed in 2014, so turnout was greatly depressed that year. Turnout in 2018 was up more than 50 percentage points in those counties, so plotting the raw change stretches the color scale and hides the variation among the other counties in Texas. To get a better idea of the statewide pattern without those outlier counties skewing it, let's set every county in that congressional district to the maximum turnout change among all other counties in the state, then redo the map.
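As a side note, the capping step can also be written so that the cap is computed rather than typed in by hand. The sketch below uses a toy frame and a hypothetical `cap_outliers` helper; the county labels and values are invented, and only the `county` and `tochange1418` column names come from this analysis.

```python
import pandas as pd

# Toy data: "C" and "D" stand in for the uncontested-district counties.
toy = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "tochange1418": [0.10, 0.30, 0.55, 0.60],
})
outliers = ["C", "D"]

def cap_outliers(df, names, col="tochange1418"):
    """Set the named counties to the maximum value observed among all other counties."""
    capped = df.copy()
    is_outlier = capped["county"].isin(names)
    cap = capped.loc[~is_outlier, col].max()
    capped.loc[is_outlier, col] = cap
    return capped, cap

capped, cap = cap_outliers(toy, outliers)
print(cap)
print(capped)
```

Working on a copy keeps the raw turnout change available for any later reporting, which matters once capped values start appearing in regional maps.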
# Sort while selecting so no in-place sort runs on a slice of gdf_all
countycheck = gdf_all[['county', 'tochange1418']].sort_values(by='tochange1418', ascending=False)
countycheck.head(20)
# Update counties in TX-4 to max turnout change in rest of TX
counties_to_update = countycheck['county'][:16].to_list()
gdf_all.loc[gdf_all['county'].isin(counties_to_update), 'tochange1418'] = 0.324939
# Repeat the visualization process for turnout change
choro_json = json.loads(gdf_all.to_json())
choro_data = alt.Data(values=choro_json['features'])
# Turnout
base_to = alt.Chart(choro_data).mark_geoshape(
stroke='black',
strokeWidth=1
).encode(
)
choro_to = alt.Chart(choro_data).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill('properties.tochange1418',
type='quantitative',
scale=alt.Scale(domain=[-.1, -.05, 0, .05, .1, .15, .2, .25, .3, .35],
range=["#ffffe5", "#f7fcb9", "#d9f0a3", "#addd8e",
"#78c679", "#41ab5d", "#238443", "#005a32"]),
legend=alt.Legend(title="Change in turnout",
orient='bottom-left', padding=5, fillColor='white',
strokeColor='black', format='%',
values=[-.1, 0, .1, .2],
titleLineHeight=12, titleLimit=100))
)
change_to = (base_to + choro_to).properties(
width=450,
height=450,
title={
'text':'Percentage change in overall turnout',
'subtitle':'Between midterm elections for US House, 2014-2018',
'subtitleFont':'Calibri'
}
).configure(
background='#FFFFFF'
)
change_to.configure_view(strokeOpacity=0)
Source: Texas Secretary of State election results
This updated map gives a fuller appreciation for the extent to which turnout was up across the entire state, save for some smaller rural counties in southern and western parts of the state. To get a better idea of how trends differed across different parts of Texas, let's zoom in on three areas in particular: the Dallas and Houston metro areas, and the small rural counties that make up the 23rd congressional district (a largely rural area in southwest Texas that borders Mexico).
houston_counties = ['Austin County', 'Brazoria County', 'Chambers County',
'Fort Bend County', 'Galveston County', 'Harris County',
'Liberty County', 'Montgomery County', 'Waller County']
dallas_counties = ['Hood County', 'Johnson County', 'Parker County',
'Somervell County', 'Tarrant County', 'Wise County',
'Collin County', 'Dallas County', 'Denton County',
'Ellis County', 'Hunt County', 'Kaufman County',
'Rockwall County']
tx23_counties = ['Val Verde', 'Dimmit', 'Culberson', 'Hudspeth', 'Maverick',
'Medina', 'La Salle', 'Ward', 'Jeff Davis', 'Winkler', 'Schleicher',
'Uvalde', 'Frio', 'Pecos', 'Zavala', 'Crockett', 'Brewster',
'Loving', 'Reagan', 'Kinney', 'Crane', 'Terrell', 'Presidio',
'Edwards', 'Reeves',]
houston_gdf = gdf_all[gdf_all['COUNTY'].isin(houston_counties)]
dallas_gdf = gdf_all[gdf_all['COUNTY'].isin(dallas_counties)]
tx23_gdf = gdf_all[gdf_all['county'].isin(tx23_counties)]
houston_gdf = houston_gdf.sort_values(by='county')
choro_json_hou = json.loads(houston_gdf.to_json())
choro_data_hou = alt.Data(values=choro_json_hou['features'])
base_hou = alt.Chart(choro_data_hou).mark_geoshape(
stroke='black',
strokeWidth=1
).encode(
).properties(
width=400,
height=400
)
choro_hou = alt.Chart(choro_data_hou).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill('properties.demchange1418',
type='quantitative',
scale=alt.Scale(domain=[0, .05, .1, .15, .2, .25],
range=["#a5b0ff", "#799be2", "#6674d3",
"#584cde", "#0f38bf"]),
legend=alt.Legend(title=["Change in", 'Dem voteshare'],
orient='bottom-left', padding=10, fillColor='white',
strokeColor='black', format='%',
values=[0, .1, .25],
titleLineHeight=12, titleLimit=100))
)
results_hou = (base_hou + choro_hou).properties(
width=400,
height=400,
title={
'text':' ',
'subtitle':'Change in Democratic vote share, 2014-2018',
'anchor':'middle',
'subtitleFontSize':12,
'subtitleFontWeight':'bold',
'subtitleFont':'Calibri'
}
)
choro_hou_to = alt.Chart(choro_data_hou).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill('properties.tochange1418',
type='quantitative',
scale=alt.Scale(domain=[0, .05, .1, .15, .2, .25, .3],
range=["#f7fcb9", "#d9f0a3", "#addd8e",
"#78c679", "#41ab5d", "#238443"]),
legend=alt.Legend(title=["Change in", 'turnout'],
orient='bottom-right', padding=10, fillColor='white',
strokeColor='black', format='%',
values=[0, .1, .25],
titleLineHeight=12, titleLimit=100))
)
to_hou = (base_hou + choro_hou_to).properties(
width=400,
height=400,
title={
'text':' ',
'subtitle':'Change in overall turnout, 2014-2018',
'anchor':'middle',
'subtitleFontSize':12,
'subtitleFontWeight':'bold',
'subtitleFont':'Calibri'
}
)
houston_area = alt.hconcat(results_hou, to_hou).configure(
background='#FFFFFF',
autosize='pad'
).resolve_scale(
fill='independent'
).properties(
title={
'text':'Change in Democratic vote share and total turnout, Houston-area counties',
'subtitle':'For US House races in 2014 and 2018 midterms',
'subtitleFont':'Calibri'
}
).configure_view(strokeOpacity=0)
Image('2_houston.png', width=900)
Source: Texas Secretary of State election results
We see that both turnout and the Democratic share of the vote were uniformly up across the 9 counties that make up the Houston MSA from the 2014 midterms to the 2018 midterms. Montgomery County had the largest change in vote share for the Democratic House candidate (23.5 percentage points), while Fort Bend saw the greatest change in turnout (23.8 percentage points).
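Figures like these can be read off a regional frame with `idxmax`. The snippet below is a small sketch on invented values; `top_county` is a hypothetical helper, and the same call could be pointed at `houston_gdf` or `dallas_gdf`.

```python
import pandas as pd

# Toy regional frame (invented values) with the two change measures.
toy_region = pd.DataFrame({
    "county": ["A", "B", "C"],
    "demchange1418": [0.05, 0.12, 0.08],
    "tochange1418": [0.20, 0.10, 0.15],
})

def top_county(df, col):
    """Return the county name and value for the largest entry in `col`."""
    row = df.loc[df[col].idxmax()]
    return row["county"], row[col]

print(top_county(toy_region, "demchange1418"))
print(top_county(toy_region, "tochange1418"))
```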
dallas_gdf = dallas_gdf.sort_values(by='county')
choro_json_dal = json.loads(dallas_gdf.to_json())
choro_data_dal = alt.Data(values=choro_json_dal['features'])
base_dal = alt.Chart(choro_data_dal).mark_geoshape(
stroke='black',
strokeWidth=1
).encode(
).properties(
width=400,
height=400
)
choro_dal = alt.Chart(choro_data_dal).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill('properties.demchange1418',
type='quantitative',
scale=alt.Scale(domain=[0, .05, .1, .15, .2, .25, .3],
range=["#a5b0ff", "#799be2", "#6674d3",
"#584cde", "#0f38bf", "#0b2a8e"]),
legend=alt.Legend(title=["Change in", 'Dem voteshare'],
orient='bottom-right', padding=10, fillColor='white',
strokeColor='black', format='%',
values=[0, .1, .25],
titleLineHeight=12, titleLimit=100))
)
results_dal = (base_dal + choro_dal).properties(
width=400,
height=400,
title={
'text':' ',
'subtitle':'Change in Democratic vote share, 2014-2018',
'anchor':'middle',
'subtitleFontSize':12,
'subtitleFontWeight':'bold',
'subtitleFont':'Calibri'
}
)
choro_dal_to = alt.Chart(choro_data_dal).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill('properties.tochange1418',
type='quantitative',
scale=alt.Scale(domain=[0, .05, .1, .15, .2, .25, .3, .35],
range=["#f7fcb9", "#d9f0a3", "#addd8e",
"#78c679", "#41ab5d", "#238443", "#005a32"]),
legend=alt.Legend(title=["Change in", 'turnout'],
orient='bottom-right', padding=10, fillColor='white',
strokeColor='black', format='%',
values=[0, .1, .25],
titleLineHeight=12, titleLimit=100))
)
to_dal = (base_dal + choro_dal_to).properties(
width=400,
height=400,
title={
'text':' ',
'subtitle':'Change in overall turnout, 2014-2018',
'anchor':'middle',
'subtitleFontSize':12,
'subtitleFontWeight':'bold',
'subtitleFont':'Calibri'
}
)
dallas_area = alt.hconcat(results_dal, to_dal).configure(
background='#FFFFFF',
autosize='pad'
).resolve_scale(
fill='independent'
).properties(
title={
'text':'Change in Democratic vote share and total turnout, Dallas-area counties',
'subtitle':'For US House races in 2014 and 2018 midterms',
'subtitleFont':'Calibri'
}
).configure_view(strokeOpacity=0)
Image('2_dallas.png', width=900)
Source: Texas Secretary of State election results
The same pattern that we observed in Houston holds true in Dallas: the city of Dallas and its surrounding suburbs (the 13 counties in the MSA) saw uniformly higher turnout and Democratic vote share. These visualizations help explain Democrats' pickups in districts like the 32nd (Colin Allred's suburban Dallas district). Kaufman County saw the largest swing in vote share towards the Democrats (29.1 percentage points), while Hunt and Rockwall Counties show the largest gain in voter turnout (about 32.4 percentage points). Both of those counties lie in the 4th congressional district, so that figure reflects the cap applied earlier rather than their raw change.
tx23_gdf = tx23_gdf.sort_values(by='county')
choro_json_tx23 = json.loads(tx23_gdf.to_json())
choro_data_tx23 = alt.Data(values=choro_json_tx23['features'])
base_tx23 = alt.Chart(choro_data_tx23).mark_geoshape(
stroke='black',
strokeWidth=1
).encode(
).properties(
width=400,
height=400
)
choro_tx23 = alt.Chart(choro_data_tx23).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill('properties.demchange1418',
type='quantitative',
scale=alt.Scale(domain=[-.25, -.2, -.15, -.1, -.05, 0, .05],
range=["#a80000", "#c21b18", "#d72f30",
"#d75d5d", "#e27f7f", "#a5b0ff"]),
legend=alt.Legend(title=["Change in", 'Dem voteshare'],
orient='bottom-left', padding=10, fillColor='white',
strokeColor='black', format='%',
values=[-.2, -.1, 0],
titleLineHeight=12, titleLimit=100))
)
results_tx23 = (base_tx23 + choro_tx23).properties(
width=400,
height=400,
title={
'text':' ',
'subtitle':'Change in Democratic vote share, 2014-2018',
'anchor':'middle',
'subtitleFontSize':12,
'subtitleFontWeight':'bold',
'subtitleFont':'Calibri'
}
)
choro_tx23_to = alt.Chart(choro_data_tx23).mark_geoshape(
fill='lightgray',
stroke='black'
).encode(
alt.Fill('properties.tochange1418',
type='quantitative',
scale=alt.Scale(domain=[-.1, -.05, 0, .05, .1, .15, .2, .25, .3, .35],
range=["#a6dba0", "#d9f0d3", "#ffffe5", "#fff7bc", "#fee391",
"#fec44f", "#fe9929", "#ec7014", "#cc4c02"]),
legend=alt.Legend(title=["Change in", 'turnout'],
orient='bottom-right', padding=10, fillColor='white',
strokeColor='black', format='%',
values=[-.1, 0, .1, .25],
titleLineHeight=12, titleLimit=100))
)
to_tx23 = (base_tx23 + choro_tx23_to).properties(
width=400,
height=400,
title={
'text':' ',
'subtitle':'Change in overall turnout, 2014-2018',
'anchor':'middle',
'subtitleFontSize':12,
'subtitleFontWeight':'bold',
'subtitleFont':'Calibri'
}
)
tx23_area = alt.hconcat(results_tx23, to_tx23).configure(
background='#FFFFFF',
autosize='pad'
).resolve_scale(
fill='independent'
).properties(
title={
'text':'Change in Democratic vote share and total turnout, TX-23 counties',
'subtitle':'For US House races in 2014 and 2018 midterms',
'subtitleFont':'Calibri'
}
).configure_view(strokeOpacity=0)
Image('2_tx23.png', width=900)
Source: Texas Secretary of State election results
Interestingly, Democrats performed worse in counties across Texas' 23rd congressional district, despite nearly unseating Republican incumbent Will Hurd. Turnout was up in this largely rural and Latino district, though not nearly to the extent that it increased in the metro Dallas and Houston areas. Edwards County was one of the few counties across the entire state that saw a decline in voter turnout between 2014 and 2018.
Next, I'll investigate by-county demographics of the same three subareas we investigated in Part 1, to see whether any of the electoral/turnout trends correlate with the demographic composition of these areas. The data for this segment come from the Census Bureau's American Community Survey (ACS), with 5-year estimates produced in 2018. These data include information on race and age at the county level.
# Load total population by county
countypop = pd.read_csv('countypop.csv', dtype={'FIPS':str})
# Load demographic datasets
acs_race = pd.read_csv('race_long.csv', dtype={'FIPS':str})
acs_age = pd.read_csv('age_long.csv', dtype={'FIPS':str})
acs_gender = pd.read_csv('gender_long.csv', dtype={'FIPS':str})
houston_counties = ['Austin', 'Brazoria', 'Chambers', 'Fort Bend', 'Galveston', 'Harris',
'Liberty', 'Montgomery', 'Waller']
race_hou = acs_race[acs_race['COUNTY'].isin(houston_counties)]
race_hou = race_hou.merge(countypop, how='left', on=['FIPS', 'COUNTY'])
race_hou.loc[:, 'PERC'] = race_hou['COUNT'] / race_hou['TOTAL_POP']
white_pop_rank_hou = race_hou[race_hou['RACE']=='White'] \
.sort_values(by='PERC', ascending=False)[['COUNTY', 'FIPS', 'PERC']] \
.rename(columns={'PERC':'PERC_WHITE'})
white_pop_rank_hou['RANK_WHITE'] = np.arange(1, len(white_pop_rank_hou)+1)
race_hou = race_hou.merge(white_pop_rank_hou, how='left', on=['FIPS', 'COUNTY'])
houston_race = alt.Chart(race_hou).mark_bar().encode(
x=alt.X('sum(COUNT)', stack='normalize',
axis=alt.Axis(title='Percent of population', format='%')),
y=alt.Y('COUNTY:N',
sort=alt.EncodingSortField(field='RANK_WHITE'),
axis=alt.Axis(title='County')),
color=alt.Color('RACE:N',
scale=alt.Scale(range=['#ffd700', '#ffb14e', '#fa8775',
'#ea5f94', '#cd34b5', '#9d02d7', '#7A0862']),
legend=alt.Legend(title=None, orient='right',
padding=5, fillColor='white', strokeColor='black'))
).properties(
width=400,
title={
'text':' ',
'subtitle':'By racial composition',
'anchor':'middle',
'align':'center',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold'
})
age_hou = acs_age[acs_age['COUNTY'].isin(houston_counties)]
age_hou = age_hou.merge(countypop, how='left', on=['FIPS', 'COUNTY'])
age_hou.loc[:, 'PERC'] = age_hou['COUNT'] / age_hou['TOTAL_POP']
u20_pop_rank_hou = age_hou[age_hou['AGE']=='19 and under'] \
.sort_values(by='PERC', ascending=False)[['COUNTY', 'FIPS', 'PERC']] \
.rename(columns={'PERC':'PERC_19'})
u20_pop_rank_hou['RANK_19'] = np.arange(1, len(u20_pop_rank_hou)+1)
age_hou = age_hou.merge(u20_pop_rank_hou, how='left', on=['FIPS', 'COUNTY'])
houston_age = alt.Chart(age_hou).mark_bar().encode(
x=alt.X('sum(COUNT)', stack='normalize',
axis=alt.Axis(title='Percent of population', format='%')),
y=alt.Y('COUNTY:N',
sort=alt.EncodingSortField(field='RANK_19'),
axis=alt.Axis(title='County')),
color=alt.Color('AGE:N',
scale=alt.Scale(range=['#a6611a', '#dfc27d', '#f5f5f5',
'#80cdc1', '#018571']),
legend=alt.Legend(title=None, orient='right',
padding=5, fillColor='white', strokeColor='black'))
).properties(
width=400,
title={
'text':' ',
'subtitle':'By age composition',
'anchor':'middle',
'align':'center',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold'
})
houston_demo = alt.vconcat(houston_race, houston_age).resolve_scale(color='independent')
gdf_all.loc[:, 'houston_area'] = "no"
gdf_all.loc[gdf_all['county'].isin(houston_counties), 'houston_area'] = "Houston MSA county"
choro_json = json.loads(gdf_all.to_json())
choro_data = alt.Data(values=choro_json['features'])
base_hou = alt.Chart(choro_data).mark_geoshape(
stroke='gray',
strokeWidth=1
).encode(
)
choro_hou = alt.Chart(choro_data).mark_geoshape(
fill='white',
stroke='gray'
).encode(
alt.Fill('properties.houston_area',
type='ordinal',
scale=alt.Scale(domain=["Houston MSA county", "no"],
range=["#1CA517", "#ffffff"]),
legend=alt.Legend(title=None, orient='none', values=['Houston MSA county'],
padding=5, fillColor='white', strokeColor='black',
legendX=640, legendY=175))
)
houston_highlight = (base_hou + choro_hou).properties(
width=200,
height=222
).properties(
title={
'text':' ',
'subtitle':'Houston MSA counties',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold'
}
)
houston_county_change = alt.Chart(houston_gdf).mark_bar().encode(
x=alt.X('demchange1418:Q', axis=alt.Axis(title='Percent change in Dem voteshare', format='%')),
y=alt.Y('county:N', axis=alt.Axis(title='County'), sort='-x'),
color=alt.condition(
alt.datum.demchange1418 > 0,
alt.value("#0f38bf"),
alt.value("#a80000")
),
opacity=alt.Opacity('demchange1418:Q', legend=None)
).properties(
width=200,
title={
'text':' ',
'subtitle': 'Change in Democratic vote share, 2014-2018 midterms',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold'
})
houston_right = alt.vconcat(houston_highlight, houston_county_change).resolve_scale(
color='independent',
opacity='independent'
)
full_houston = alt.hconcat(houston_demo, houston_right).configure(
background='#FFFFFF',
autosize='pad'
).resolve_scale(
color='independent'
).properties(
title={
'text':'Demographic composition of Houston-area counties, 2018 5-year ACS',
'subtitle':'With geographic highlight and change in Democratic vote share, 2014-2018',
'subtitleFont':'Calibri'
}
).configure_view(strokeOpacity=0)
full_houston
Sources: Census Bureau 5-year ACS estimates (2018) and Texas Secretary of State election results
Counties in the Houston MSA tend to be young and diverse. In particular, Harris County, the county in which the city of Houston is located (and the highest-population county in Texas), is nearly 50% under the age of 35, and more than 50% nonwhite. Montgomery County is the Houston-area county that saw the greatest swing in vote share to the Democrats from 2014-2018, and the demographics of this county are majority white, which suggests that Democratic gains in suburban areas were not driven entirely by voters of color.
dallas_counties = ['Hood', 'Johnson', 'Parker', 'Somervell', 'Tarrant', 'Wise',
'Collin', 'Dallas', 'Denton', 'Ellis', 'Hunt', 'Kaufman',
'Rockwall']
race_dal = acs_race[acs_race['COUNTY'].isin(dallas_counties)]
race_dal = race_dal.merge(countypop, how='left', on=['FIPS', 'COUNTY'])
race_dal.loc[:, 'PERC'] = race_dal['COUNT'] / race_dal['TOTAL_POP']
white_pop_rank_dal = race_dal[race_dal['RACE']=='White'] \
.sort_values(by='PERC', ascending=False)[['COUNTY', 'FIPS', 'PERC']] \
.rename(columns={'PERC':'PERC_WHITE'})
white_pop_rank_dal['RANK_WHITE'] = np.arange(1, len(white_pop_rank_dal)+1)
race_dal = race_dal.merge(white_pop_rank_dal, how='left', on=['FIPS', 'COUNTY'])
dallas_race = alt.Chart(race_dal).mark_bar().encode(
x=alt.X('sum(COUNT)', stack='normalize',
axis=alt.Axis(title='Percent of population', format='%')),
y=alt.Y('COUNTY:N',
sort=alt.EncodingSortField(field='RANK_WHITE'),
axis=alt.Axis(title='County')),
color=alt.Color('RACE:N',
scale=alt.Scale(range=['#ffd700', '#ffb14e', '#fa8775',
'#ea5f94', '#cd34b5', '#9d02d7', '#7A0862']),
legend=alt.Legend(title=None, orient='right',
padding=5, fillColor='white', strokeColor='black'))
).properties(
width=400,
title={
'text':' ',
'subtitle':'By racial composition',
'subtitleFontWeight':'bold',
'subtitleFont':'Calibri',
'anchor':'middle',
'align':'center'
})
age_dal = acs_age[acs_age['COUNTY'].isin(dallas_counties)]
age_dal = age_dal.merge(countypop, how='left', on=['FIPS', 'COUNTY'])
age_dal.loc[:, 'PERC'] = age_dal['COUNT'] / age_dal['TOTAL_POP']
u20_pop_rank_dal = age_dal[age_dal['AGE']=='19 and under'] \
.sort_values(by='PERC', ascending=False)[['COUNTY', 'FIPS', 'PERC']] \
.rename(columns={'PERC':'PERC_19'})
u20_pop_rank_dal['RANK_19'] = np.arange(1, len(u20_pop_rank_dal)+1)
age_dal = age_dal.merge(u20_pop_rank_dal, how='left', on=['FIPS', 'COUNTY'])
dallas_age = alt.Chart(age_dal).mark_bar().encode(
x=alt.X('sum(COUNT)', stack='normalize',
axis=alt.Axis(title='Percent of population', format='%')),
y=alt.Y('COUNTY:N',
sort=alt.EncodingSortField(field='RANK_19'),
axis=alt.Axis(title='County')),
color=alt.Color('AGE:N',
scale=alt.Scale(range=['#a6611a', '#dfc27d', '#f5f5f5',
'#80cdc1', '#018571']),
legend=alt.Legend(title=None, orient='right',
padding=5, fillColor='white', strokeColor='black'))
).properties(
width=400,
title={
'text':' ',
'subtitle':'By age composition',
'subtitleFontWeight':'bold',
'subtitleFont':'Calibri',
'anchor':'middle',
'align':'center'
})
dallas_demo = alt.vconcat(dallas_race, dallas_age).resolve_scale(
color='independent'
)
gdf_all.loc[:, 'dallas_area'] = "no"
gdf_all.loc[gdf_all['county'].isin(dallas_counties), 'dallas_area'] = "Dallas MSA county"
choro_json = json.loads(gdf_all.to_json())
choro_data = alt.Data(values=choro_json['features'])
base_dal = alt.Chart(choro_data).mark_geoshape(
stroke='gray',
strokeWidth=1
).encode(
)
choro_dal = alt.Chart(choro_data).mark_geoshape(
fill='white',
stroke='gray'
).encode(
alt.Fill('properties.dallas_area',
type='ordinal',
scale=alt.Scale(domain=["Dallas MSA county", "no"],
range=["#1CA517", "#ffffff"]),
legend=alt.Legend(title=None, orient='none', values=['Dallas MSA county'],
padding=5, fillColor='white', strokeColor='black',
legendX=720, legendY=230))
)
dallas_highlight = (base_dal + choro_dal).properties(
width=250,
height=300
).properties(
title={
'text':' ',
'subtitle':'Dallas MSA counties',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold'
}
)
dallas_county_change = alt.Chart(dallas_gdf).mark_bar().encode(
x=alt.X('demchange1418:Q', axis=alt.Axis(title='Percent change in Dem voteshare', format='%')),
y=alt.Y('county:N', axis=alt.Axis(title='County'), sort='-x'),
color=alt.condition(
alt.datum.demchange1418 > 0,
alt.value("#0f38bf"),
alt.value("#a80000")
),
opacity=alt.Opacity('demchange1418:Q', legend=None)
).properties(
width=250,
title={
'text':' ',
'subtitle': 'Change in Democratic vote share, 2014-2018 midterms',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold'
})
dallas_right = alt.vconcat(dallas_highlight, dallas_county_change).resolve_scale(
color='independent',
opacity='independent'
)
full_dallas = alt.hconcat(dallas_demo, dallas_right).configure(
background='#FFFFFF',
autosize='pad'
).resolve_scale(
color='independent'
).properties(
title={
'text':'Demographic composition of Dallas-area counties, 2018 5-year ACS',
'subtitle':'With geographic highlight and change in Democratic vote share, 2014-2018',
'subtitleFont':'Calibri'
}
).configure_view(strokeOpacity=0)
full_dallas
Sources: Census Bureau 5-year ACS estimates (2018) and Texas Secretary of State election results
We see patterns in the Dallas area's demographics that are similar to Houston's: the largest county (Dallas itself) is majority nonwhite, and about 50% of its residents are younger than 35. Tarrant County (Fort Worth) is also more than 50% nonwhite. On the other hand, many of the surrounding counties are whiter, and it is these counties (especially Kaufman, Rockwall and Hunt) that experienced the largest shift towards Democrats in the US House vote in 2018. This suggests that Democrats already performed strongly in more diverse, urban counties like Dallas and Tarrant, but had much more room to improve in whiter, suburban counties close to city centers.
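The county ordering in the demographic bars comes from computing each group's share of county population and then ranking counties by their white share. The following is a minimal sketch of that step on made-up numbers; the toy counties and counts are illustrative assumptions, not ACS data, and only the column names mirror the ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy long-format counts (made-up numbers): one row per county and group
toy_race = pd.DataFrame({
    'COUNTY': ['A', 'A', 'B', 'B', 'C', 'C'],
    'RACE': ['White', 'Other', 'White', 'Other', 'White', 'Other'],
    'COUNT': [300, 700, 800, 200, 500, 500],
})
toy_pop = pd.DataFrame({'COUNTY': ['A', 'B', 'C'],
                        'TOTAL_POP': [1000, 1000, 1000]})

# Attach total population and compute each group's share of the county
toy_race = toy_race.merge(toy_pop, how='left', on='COUNTY')
toy_race['PERC'] = toy_race['COUNT'] / toy_race['TOTAL_POP']

# Rank counties from most to least white (rank 1 = highest white share)
white_rank = (toy_race[toy_race['RACE'] == 'White']
              .sort_values(by='PERC', ascending=False)[['COUNTY', 'PERC']]
              .rename(columns={'PERC': 'PERC_WHITE'}))
white_rank['RANK_WHITE'] = np.arange(1, len(white_rank) + 1)
```

Merging the rank back onto the long-format table is what lets the chart sort its y-axis by a single per-county value.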
race_tx23 = acs_race[acs_race['COUNTY'].isin(tx23_counties)]
race_tx23 = race_tx23.merge(countypop, how='left', on=['FIPS', 'COUNTY'])
race_tx23.loc[:, 'PERC'] = race_tx23['COUNT'] / race_tx23['TOTAL_POP']
white_pop_rank_tx23 = race_tx23[race_tx23['RACE']=='White'] \
.sort_values(by='PERC', ascending=False)[['COUNTY', 'FIPS', 'PERC']] \
.rename(columns={'PERC':'PERC_WHITE'})
white_pop_rank_tx23['RANK_WHITE'] = np.arange(1, len(white_pop_rank_tx23)+1)
race_tx23 = race_tx23.merge(white_pop_rank_tx23, how='left', on=['FIPS', 'COUNTY'])
tx23_race = alt.Chart(race_tx23).mark_bar().encode(
x=alt.X('sum(COUNT)', stack='normalize',
axis=alt.Axis(title='Percent of population', format='%')),
y=alt.Y('COUNTY:N',
sort=alt.EncodingSortField(field='RANK_WHITE'),
axis=alt.Axis(title='County')),
color=alt.Color('RACE:N',
scale=alt.Scale(range=['#ffd700', '#ffb14e', '#fa8775',
'#ea5f94', '#cd34b5', '#9d02d7', '#7A0862']),
legend=alt.Legend(title=None, orient='right',
padding=5, fillColor='white', strokeColor='black'))
).properties(
width=400,
title={
'text':' ',
'subtitle':'By racial composition',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold',
'anchor':'middle',
'align':'center'
})
age_tx23 = acs_age[acs_age['COUNTY'].isin(tx23_counties)]
age_tx23 = age_tx23.merge(countypop, how='left', on=['FIPS', 'COUNTY'])
age_tx23.loc[:, 'PERC'] = age_tx23['COUNT'] / age_tx23['TOTAL_POP']
u20_pop_rank_tx23 = age_tx23[age_tx23['AGE']=='19 and under'] \
.sort_values(by='PERC', ascending=False)[['COUNTY', 'FIPS', 'PERC']] \
.rename(columns={'PERC':'PERC_19'})
u20_pop_rank_tx23['RANK_19'] = np.arange(1, len(u20_pop_rank_tx23)+1)
age_tx23 = age_tx23.merge(u20_pop_rank_tx23, how='left', on=['FIPS', 'COUNTY'])
tx23_age = alt.Chart(age_tx23).mark_bar().encode(
x=alt.X('sum(COUNT)', stack='normalize',
axis=alt.Axis(title='Percent of population', format='%')),
y=alt.Y('COUNTY:N',
sort=alt.EncodingSortField(field='RANK_19'),
axis=alt.Axis(title='County')),
color=alt.Color('AGE:N',
scale=alt.Scale(range=['#a6611a', '#dfc27d', '#f5f5f5',
'#80cdc1', '#018571']),
legend=alt.Legend(title=None, orient='right',
padding=5, fillColor='white', strokeColor='black'))
).properties(
width=400,
title={
'text':' ',
'subtitle':'By age composition',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold',
'anchor':'middle',
'align':'center'
})
tx23_demo = alt.vconcat(tx23_race, tx23_age).resolve_scale(
color='independent'
)
gdf_all.loc[:, 'tx23_area'] = "no"
gdf_all.loc[gdf_all['county'].isin(tx23_counties), 'tx23_area'] = "TX-23 county"
choro_json = json.loads(gdf_all.to_json())
choro_data = alt.Data(values=choro_json['features'])
base_tx23 = alt.Chart(choro_data).mark_geoshape(
stroke='gray',
strokeWidth=1
).encode(
)
choro_tx23 = alt.Chart(choro_data).mark_geoshape(
fill='white',
stroke='gray'
).encode(
alt.Fill('properties.tx23_area',
type='ordinal',
scale=alt.Scale(domain=["TX-23 county", "no"],
range=["#1CA517", "#ffffff"]),
legend=alt.Legend(title=None, orient='none', values=['TX-23 county'],
padding=5, fillColor='white', strokeColor='black',
legendX=720, legendY=370))
)
tx23_highlight = (base_tx23 + choro_tx23).properties(
width=350,
height=540
).properties(
title={
'text':' ',
'subtitle':'TX-23 counties',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold',
'anchor':'middle',
'align':'center'
}
)
tx23_county_change = alt.Chart(tx23_gdf).mark_bar().encode(
x=alt.X('demchange1418:Q', axis=alt.Axis(title='Percent change in Dem voteshare', format='%')),
y=alt.Y('county:N', axis=alt.Axis(title='County'), sort='-x'),
color=alt.condition(
alt.datum.demchange1418 > 0,
alt.value("#0f38bf"),
alt.value("#a80000")
),
opacity=alt.Opacity('demchange1418:Q', legend=None)
).properties(
width=350,
title={
'text':' ',
'subtitle': 'Change in Democratic vote share, 2014-2018 midterms',
'subtitleFont':'Calibri',
'subtitleFontWeight':'bold',
'anchor':'middle',
'align':'center'
})
tx23_right = alt.vconcat(tx23_highlight, tx23_county_change).resolve_scale(
color='independent',
opacity='independent'
)
full_tx23 = alt.hconcat(tx23_demo, tx23_right).configure(
background='#FFFFFF',
autosize='pad'
).resolve_scale(
color='independent'
).properties(
title={
'text':'Demographic composition of TX-23 counties, 2018 5-year ACS',
'subtitle':'With geographic highlight and change in Democratic vote share, 2014-2018',
'subtitleFont':'Calibri'
}
).configure_view(strokeOpacity=0)
full_tx23
Sources: Census Bureau 5-year ACS estimates (2018) and Texas Secretary of State election results
In a complete reversal of the trend observed in the Dallas and Houston areas, 22 of the 25 counties in Texas' 23rd congressional district shifted towards the Republicans in the 2018 midterms. Intriguingly, most of the counties in the district are majority Hispanic, which runs counter to the prevailing narrative that Republicans don't perform well with Hispanic voters. There are a few reasons this could be the case: Will Hurd is a popular incumbent. The district is also highly rural, and it could be that the urban/rural divide is more important in predicting partisan support than the race of the legislator representing the district.
Still, it's interesting to note that even in the Dallas and Houston areas, most of the counties that saw a surge in Democratic support were majority white, and the opposite holds true in the 23rd district. Pecos County, which swung towards the Republicans by more than 20%, is less than 30% white.
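A count such as "22 of the 25 counties" can be checked directly from the vote-share columns. The sketch below uses made-up shares and assumes that `demchange1418` is simply the 2018 Democratic share minus the 2014 share; the construction of that column is not shown in this part of the notebook, so treat the definition as an assumption.

```python
import pandas as pd

# Toy vote shares (made-up numbers), one row per county
toy = pd.DataFrame({
    'county': ['A', 'B', 'C', 'D'],
    'demperc14': [0.40, 0.55, 0.30, 0.50],
    'demperc18': [0.35, 0.60, 0.25, 0.45],
})

# Assumed definition: change in Democratic share between the two midterms
toy['demchange1418'] = toy['demperc18'] - toy['demperc14']

# A negative change means the county moved towards the Republicans
n_toward_rep = int((toy['demchange1418'] < 0).sum())
n_total = len(toy)
print(f'{n_toward_rep} of {n_total} counties shifted towards the Republicans')
```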
Next, to expound on the relationship explored in Part 3, I'll build a scatter plot of all counties in Texas showing the relationship between county median age, the proportion of the county population that is white, and votes for US House in the last two midterm elections (2014 and 2018). Midterm elections are most directly comparable to each other because they differ from presidential election years in voter enthusiasm and participation. I'm curious to see whether the conventional wisdom that Democrats do better in younger, nonwhite and urban areas holds true in the data.
results_and_pop = results.rename(columns={'county':'COUNTY'})
results_and_pop = results_and_pop.merge(countypop, how='left', on=['FIPS', 'COUNTY'])
results_and_pop.head()
# Merge results with race info, calculate proportion of county that is white
part4_results_demos = results_and_pop.merge(acs_race, how='left', on=['COUNTY', 'FIPS'])
part4_results_demos['PERC_WHITE'] = part4_results_demos['COUNT'] / part4_results_demos['TOTAL_POP']
# Filter to include only 1 row per county, merge with age
part4_results_demos = part4_results_demos[part4_results_demos['RACE'] == 'White']
part4_results_demos = part4_results_demos.merge(
acs_age[acs_age['AGE']=='19 and under'][['FIPS', 'COUNTY', 'MEDIAN_AGE']],
how='left', on = ['COUNTY', 'FIPS'])
# Generate categorical variable that shows which party won the vote in a given county
part4_results_demos.loc[part4_results_demos['demperc12']>=.5, 'WIN12'] = 'Democrat'
part4_results_demos.loc[part4_results_demos['demperc12']<.5, 'WIN12'] = 'Republican'
part4_results_demos.loc[part4_results_demos['demperc14']>=.5, 'WIN14'] = 'Democrat'
part4_results_demos.loc[part4_results_demos['demperc14']<.5, 'WIN14'] = 'Republican'
part4_results_demos.loc[part4_results_demos['demperc16']>=.5, 'WIN16'] = 'Democrat'
part4_results_demos.loc[part4_results_demos['demperc16']<.5, 'WIN16'] = 'Republican'
part4_results_demos.loc[part4_results_demos['demperc18']>=.5, 'WIN18'] = 'Democrat'
part4_results_demos.loc[part4_results_demos['demperc18']<.5, 'WIN18'] = 'Republican'
part4_results_demos.loc[:, 'label'] = ''
part4_results_demos.loc[part4_results_demos['COUNTY'].isin(['Harris','Tarrant','Fort Bend',
'Montgomery', 'Williamson', 'Hays',
'Travis', 'Nueces', 'Jefferson']), 'label'] = part4_results_demos['COUNTY']
part4_results_demos.head()
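The paired `.loc` assignments above repeat the same rule for each election year. As a hedged alternative, the same labels could be produced in a loop; this is a sketch on toy data, not a drop-in replacement, and it keeps missing vote shares unlabeled, which matches what the `.loc` version does.

```python
import numpy as np
import pandas as pd

# Toy vote shares (made-up numbers); county C has no 2014 result
toy = pd.DataFrame({
    'COUNTY': ['A', 'B', 'C'],
    'demperc14': [0.45, 0.62, np.nan],
    'demperc18': [0.55, 0.58, 0.30],
})

for yr in ['14', '18']:
    share = toy[f'demperc{yr}']
    # Democrats win at a share of 50% or more, Republicans otherwise
    winner = pd.Series(np.where(share >= 0.5, 'Democrat', 'Republican'),
                       index=toy.index)
    # Leave the label missing wherever the vote share itself is missing
    toy[f'WIN{yr}'] = winner.where(share.notna())
```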
# Create dictionaries of parameters to feed to plotting function
dict18 = {'df':part4_results_demos, 'results':'demperc18:Q', 'winningparty':'WIN18',
'xdomain':(25,60), 'ydomain':(0,1), 'width':400, 'title':' ',
'subtitle':'Share of Democratic votes for US House in 2018, Texas, by county. Point size indicates county population.'}
dict14 = {'df':part4_results_demos, 'results':'demperc14:Q', 'xdomain':(25,60), 'ydomain':(0,1),
'winningparty':'WIN14', 'width':400, 'title':'Democrats won more populous, younger, and more nonwhite counties in 2018',
'subtitle':'Share of Democratic votes for US House in 2014, Texas, by county. Point size indicates county population.'}
# Create plotting function
def demographic_scatter(in_dict, label=False):
    '''
    Create scatter plot where every point is a county, encoded by winning party (shape),
    percent of Democratic vote (color fill), and plotted in a two-dimensional space
    of percent-white and median age
    '''
    chart_scatter = alt.Chart(in_dict['df']).mark_point(clip=True, stroke='darkgray', strokeWidth=1).encode(
        alt.X('MEDIAN_AGE:Q', scale=alt.Scale(domain=in_dict['xdomain']),
              axis=alt.Axis(title='Median age')),
        alt.Y('PERC_WHITE:Q', scale=alt.Scale(domain=in_dict['ydomain']),
              axis=alt.Axis(format='%', title='Percent white')),
        size=alt.Size('TOTAL_POP:Q',
                      scale=alt.Scale(domain=[0,4e6], range=[30, 1800]),
                      legend=None),
                      #alt.Legend(title='Total population', orient='bottom',
                      # padding=5, fillColor='white', strokeColor='black',
                      # values=[100000, 500000, 1000000, 3000000],
                      # titleLineHeight=12, titleLimit=100)),
        fill=alt.Fill(in_dict['results'],
                      scale=alt.Scale(domain=[0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1],
                                      range=["#a80000", "#c21b18", "#d72f30",
                                             "#d75d5d", "#e27f7f", "#a5b0ff",
                                             "#799be2", "#6674d3", "#584cde",
                                             "#0f38bf"]),
                      legend=alt.Legend(title=["Democratic share", 'of vote won'],
                                        orient='right', padding=5, fillColor='white',
                                        strokeColor='black', format='%', direction='horizontal',
                                        values=[0, 0.5, 1],
                                        titleLineHeight=12, titleLimit=100)),
        shape=alt.Shape(in_dict['winningparty'],
                        scale=alt.Scale(range=['circle','triangle']),
                        legend=alt.Legend(title='Winning party', orient='right',
                                          padding=5, fillColor='white', strokeColor='black',
                                          titleLineHeight=12))
    ).properties(
        width=in_dict['width'],
        height=350,
        title = {
            'text':in_dict['title'],
            'subtitle':in_dict['subtitle'],
            'subtitleFont':'Calibri'
        }
    )
    if label:
        chart_label = alt.Chart(in_dict['df']).mark_point(stroke='darkgray', strokeWidth=1).encode(
            alt.X('MEDIAN_AGE:Q', scale=alt.Scale(domain=[25,60]),
                  axis=alt.Axis(title='Median age')),
            alt.Y('PERC_WHITE:Q', axis=alt.Axis(format='%', title='Percent white')),
            text='label'
        ).mark_text(dx=0, dy=12)
        return chart_scatter + chart_label
    else:
        return chart_scatter
# Plot 2018 and 2014 results
scatter18 = demographic_scatter(dict18)
scatter14 = demographic_scatter(dict14)
alt.hconcat(scatter14, scatter18)
Sources: Census Bureau 5-year ACS estimates (2018) and Texas Secretary of State election results
We can also zoom in on a particular area of interest on the chart, which includes many of Texas' largest counties. Namely, we want to zoom in on the counties where the median age is between 30 and 40 and the percentage of the population that is white is between 20% and 60%.
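The zoom window described here is an inclusive range filter on two columns. One compact way to express it is `Series.between`, which includes both endpoints by default; the sketch below applies it to made-up counties and is only an illustration of the filter logic.

```python
import pandas as pd

# Toy counties (made-up numbers) with the two demographic columns used for zooming
toy = pd.DataFrame({
    'COUNTY': ['A', 'B', 'C', 'D'],
    'PERC_WHITE': [0.25, 0.70, 0.60, 0.40],
    'MEDIAN_AGE': [33.0, 35.0, 40.0, 45.0],
})

# Keep counties that are 20-60% white with a median age of 30-40 (endpoints included)
in_window = toy['PERC_WHITE'].between(0.2, 0.6) & toy['MEDIAN_AGE'].between(30, 40)
zoom = toy[in_window]
```

County C sits exactly on both upper bounds and is kept, which is the same behavior as the `>=` and `<=` comparisons.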
# Filter to the counties that fall inside the zoomed-in region
p4_results_zoom = part4_results_demos[(part4_results_demos['PERC_WHITE'] >= .2) &
(part4_results_demos['PERC_WHITE'] <= .6) &
(part4_results_demos['MEDIAN_AGE'] >= 30) &
(part4_results_demos['MEDIAN_AGE'] <= 40)]
zoom14 = alt.Chart(p4_results_zoom).mark_point(clip=True, stroke='darkgray', strokeWidth=1).encode(
alt.X('MEDIAN_AGE:Q', scale=alt.Scale(domain=(30,40)),
axis=alt.Axis(title='Median age')),
alt.Y('PERC_WHITE:Q', scale=alt.Scale(domain=(.2,.65)),
axis=alt.Axis(format='%', title='Percent white')),
size=alt.Size('TOTAL_POP:Q',
scale=alt.Scale(domain=[0,4e6], range=[30, 1800]),
legend=None),
fill=alt.Fill('demperc14:Q',
scale=alt.Scale(domain=[0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1],
range=["#a80000", "#c21b18", "#d72f30",
"#d75d5d", "#e27f7f", "#a5b0ff",
"#799be2", "#6674d3", "#584cde",
"#0f38bf"]),
legend=alt.Legend(title=["Democratic share", 'of vote won'],
orient='right', padding=5, fillColor='white',
strokeColor='black', format='%', direction='horizontal',
values=[0, 0.5, 1],
titleLineHeight=12, titleLimit=100)),
shape=alt.Shape('WIN14',
scale=alt.Scale(range=['circle','triangle']),
legend=alt.Legend(title='Winning party', orient='right',
padding=5, fillColor='white', strokeColor='black',
titleLineHeight=12))
).properties(
width=450,
height=450,
title = {
'text':'Harris and Fort Bend were the two largest counties to flip to the Democrats in 2018',
'subtitle':'Share of Democratic votes for US House in 2014, Texas, by county. Point size indicates county population.',
'subtitleFont':'Calibri'
}
)
label14 = alt.Chart(p4_results_zoom).mark_point(stroke='darkgray', strokeWidth=1).encode(
alt.X('MEDIAN_AGE:Q', scale=alt.Scale(domain=(30,40)),
axis=alt.Axis(title='Median age')),
alt.Y('PERC_WHITE:Q', axis=alt.Axis(format='%', title='Percent white')),
text='label'
).mark_text(dx=0, dy=12)
zoom18 = alt.Chart(p4_results_zoom).mark_point(clip=True, stroke='darkgray', strokeWidth=1).encode(
alt.X('MEDIAN_AGE:Q', scale=alt.Scale(domain=(30,40)),
axis=alt.Axis(title='Median age')),
alt.Y('PERC_WHITE:Q', scale=alt.Scale(domain=(.2,.65)),
axis=alt.Axis(format='%', title='Percent white')),
size=alt.Size('TOTAL_POP:Q',
scale=alt.Scale(domain=[0,4e6], range=[30, 1800]),
legend=None),
fill=alt.Fill('demperc18:Q',
scale=alt.Scale(domain=[0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1],
range=["#a80000", "#c21b18", "#d72f30",
"#d75d5d", "#e27f7f", "#a5b0ff",
"#799be2", "#6674d3", "#584cde",
"#0f38bf"]),
legend=alt.Legend(title=["Democratic share", 'of vote won'],
orient='right', padding=5, fillColor='white',
strokeColor='black', format='%', direction='horizontal',
values=[0, 0.5, 1],
titleLineHeight=12, titleLimit=100)),
shape=alt.Shape('WIN18',
scale=alt.Scale(range=['circle','triangle']),
legend=alt.Legend(title='Winning party', orient='right',
padding=5, fillColor='white', strokeColor='black',
titleLineHeight=12))
).properties(
width=450,
height=450,
title = {
'text':' ',
'subtitle':'Share of Democratic votes for US House in 2018, Texas, by county. Point size indicates county population.',
'subtitleFont':'Calibri'
}
)
label18 = alt.Chart(p4_results_zoom).mark_point(stroke='darkgray', strokeWidth=1).encode(
alt.X('MEDIAN_AGE:Q', scale=alt.Scale(domain=(30,40)),
axis=alt.Axis(title='Median age')),
alt.Y('PERC_WHITE:Q', axis=alt.Axis(format='%', title='Percent white')),
text='label'
).mark_text(dx=0, dy=12)
alt.hconcat(zoom14+label14, zoom18+label18)
Sources: Census Bureau 5-year ACS estimates (2018) and Texas Secretary of State election results
Not only do we clearly observe that Democrats performed best in young, populous and relatively nonwhite counties (and the reverse is true for Republicans), we also see clear indicators of Democrats' improvement in such areas from 2014 to 2018. For instance, the largest purple triangle, representing Harris County (Houston) in 2014, when Democrats won 45% of the county-wide vote, flips to a blue circle in 2018, when Democrats took 55% of the county vote. Similarly, Tarrant County (Fort Worth) is the large red triangle at a median age of 34 and 50% white; its hue clearly shifts from 2014, when Democrats won just 38% of the county-wide vote, to 2018, when Democrats picked up 45%.
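To make the arithmetic behind these two examples explicit, the short sketch below takes the rounded shares quoted in the paragraph above (not the underlying data) and computes the swing in percentage points along with whether each county crossed the 50% line.

```python
import pandas as pd

# Rounded Democratic vote shares quoted in the text above
shares = pd.DataFrame({
    'COUNTY': ['Harris', 'Tarrant'],
    'demperc14': [0.45, 0.38],
    'demperc18': [0.55, 0.45],
})

# Swing towards Democrats, in percentage points
shares['swing_pts'] = ((shares['demperc18'] - shares['demperc14']) * 100).round(1)

# A county "flips" if Democrats were below 50% in 2014 and at or above it in 2018
shares['flipped'] = (shares['demperc14'] < 0.5) & (shares['demperc18'] >= 0.5)
```

Harris moves 10 points and flips, while Tarrant moves 7 points and stays Republican, which is why one marker changes shape and the other only changes hue.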
It's important to note that the demographic estimates are constant/unchanging across the two periods because they come from the ACS 5-year estimates. Furthermore, three large counties (Harris, Dallas, and Bexar) are plotted in an overlapping fashion because they have very similar demographics.
The labeled counties are called out because they are particularly large or impactful counties that experienced significant swings towards the Democrats between 2014 and 2018. Harris and Fort Bend Counties, for example, went for Republicans in 2014 and flipped to the Democrats in 2018. Tarrant, Hays, and Jefferson Counties are prime examples of large, majority-white suburban counties that didn't entirely flip the Democrats' way in 2018 but moved significantly to the left.
In this section, I'll explore total receipts by congressional candidates in Texas, for campaign cycles 1988-2018. This data is drawn from the FEC (Federal Election Commission - specifically, the "All Candidates", "Candidate master", and "Contributions by individuals" datasets from 1988-present), which collects and publishes data on campaign finance for candidates for federal office. I'm interested in whether the total receipts by candidates of each of the two major parties have followed similar trends over the past 30 years, both in terms of amounts and the proportion of receipts that are sourced from individual donors (rather than PACs).
I performed some pre-processing on the raw FEC data prior to loading it in this notebook, including filtering to include only candidates for the US House of Representatives in Texas, aggregating receipts and contributions from individuals across all candidates of the two major parties by year, and merging with Bureau of Labor Statistics (BLS) CPI data to convert nominal dollars to real dollars using 2018 as the baseline index year.
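The conversion from nominal to real dollars described above amounts to scaling each year's amount by the ratio of the base-year CPI to that year's CPI. The sketch below shows the calculation with placeholder index values; the `CPI` and `TTL_RECEIPTS` column names and all numbers are assumptions for illustration, not the BLS series or the FEC totals used in the pre-processing.

```python
import pandas as pd

# Placeholder data: the CPI values are made up to keep the arithmetic obvious
toy = pd.DataFrame({
    'YEAR': [1988, 2018],
    'TTL_RECEIPTS': [1_000_000.0, 1_000_000.0],  # nominal dollars
    'CPI': [100.0, 200.0],
})

# Use 2018 as the baseline index year
base_cpi = toy.loc[toy['YEAR'] == 2018, 'CPI'].iloc[0]

# Real (2018) dollars = nominal dollars * CPI_2018 / CPI_year
toy['ADJ_TTL_RECEIPTS'] = toy['TTL_RECEIPTS'] * base_cpi / toy['CPI']
```

With these placeholder values, a dollar raised in 1988 counts as two 2018 dollars, while 2018 amounts are unchanged.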
# Start with some basic data importation and cleaning
fec_tx = pd.read_csv('TX_house_88_18.csv')
# Convert year column to date variable
fec_tx['YEAR_DT'] = pd.to_datetime(fec_tx['YEAR'], format="%Y")
# Convert total receipts from dollars to millions of dollars
fec_tx['ADJ_TTL_RECEIPTS_MIL'] = fec_tx['ADJ_TTL_RECEIPTS'] / 1000000
# Recode party variable for more aesthetically pleasing legend display
fec_tx.loc[fec_tx['CAND_PTY_AFFILIATION']=='DEM', 'PARTY'] = 'Democratic'
fec_tx.loc[fec_tx['CAND_PTY_AFFILIATION']=='REP', 'PARTY'] = 'Republican'
# Create separate DataFrame for all years up to 2016
fec_tx_16 = fec_tx[fec_tx['YEAR']<2018]
fec_tx.head()
line18 = alt.Chart(fec_tx).mark_line().encode(
x=alt.X('YEAR_DT', axis=alt.Axis(title='Election cycle', labelFontSize=12)),
y=alt.Y('ADJ_TTL_RECEIPTS_MIL',
axis=alt.Axis(title=['Adjusted total receipts,',
'millions of dollars'])),
color=alt.Color('PARTY',
scale=alt.Scale(domain=['Democratic', 'Republican'],
range=['blue', 'red']),
legend=alt.Legend(title='Party', orient='right',
padding=5, fillColor='white',
strokeColor='black', titleAlign='left'))
).properties(
title={
'text': 'Democrats set records in Texas congressional campaign fundraising in the 2018 midterms',
'subtitle': 'Inflation-adjusted receipts by election cycle, party, and individual giving proportion (real 2018 USD)',
'anchor': 'start',
'subtitleFont':'Calibri'
},
width=750,
height=300
)
fec_tx.loc[:, 'ADJ_TTL_RECEIPTS_IND'] = fec_tx['ADJ_TTL_RECEIPTS_MIL']*fec_tx['PERC_INDIV_CONTRIB']
fec_tx.loc[:, 'ADJ_TTL_RECEIPTS_NON'] = fec_tx['ADJ_TTL_RECEIPTS_MIL'] - fec_tx['ADJ_TTL_RECEIPTS_IND']
# Take explicit copies so the new TYPE column is not written to a slice of fec_tx
fec_tx_ind = fec_tx[['YEAR', 'PARTY', 'ADJ_TTL_RECEIPTS_IND']].copy()
fec_tx_ind = fec_tx_ind.rename(columns={'ADJ_TTL_RECEIPTS_IND':'ADJ_TTL_RECEIPTS'})
fec_tx_ind['TYPE'] = 'Individual'
fec_tx_non = fec_tx[['YEAR', 'PARTY', 'ADJ_TTL_RECEIPTS_NON']].copy()
fec_tx_non = fec_tx_non.rename(columns={'ADJ_TTL_RECEIPTS_NON':'ADJ_TTL_RECEIPTS'})
fec_tx_non['TYPE'] = 'Other'
fec_tx_out = pd.concat([fec_tx_ind, fec_tx_non])
demchart = fec_tx_out[fec_tx_out['PARTY']=='Democratic']
yearly_demtype = alt.Chart(demchart).mark_bar().encode(
x=alt.X('TYPE:N', axis=alt.Axis(labels=False, title=None)),
y=alt.Y('ADJ_TTL_RECEIPTS:Q', axis=alt.Axis(title=None)),
color=alt.Color('TYPE:N',
scale=alt.Scale(domain=['Individual', 'Other'],
range=["#0f38bf", "#a5b0ff"]),
legend=alt.Legend(title=['Contribution','type'], orient='right',
padding=5, fillColor='white', strokeColor='black',
titleAlign='left')),
column=alt.Column('YEAR:O', title=None)
).properties(height=150, width=30)
repchart = fec_tx_out[fec_tx_out['PARTY']=='Republican']
yearly_reptype = alt.Chart(repchart).mark_bar().encode(
x=alt.X('TYPE:N', axis=alt.Axis(labels=False, title=None)),
y=alt.Y('ADJ_TTL_RECEIPTS:Q', scale=alt.Scale(domain=[0, 130]),
axis=alt.Axis(title=['Adjusted total receipts,', 'millions of dollars'])),
color=alt.Color('TYPE:N',
scale=alt.Scale(domain=['Individual', 'Other'],
range=["#a80000", "#e27f7f"]),
legend=alt.Legend(title=['Contribution','type'], orient='right',
padding=5, fillColor='white', strokeColor='black',
titleAlign='left')),
column='YEAR:O'
).properties(height=150, width=30)
line18
Source: Federal Election Commission (FEC) all-candidate receipt data
Democrats and Republicans largely kept pace with each other in congressional campaign fundraising for Texas races throughout the 1990s, but around 2000 the Republicans opened up a lead that had swelled to more than $20 million (in 2018 terms) by 2016. However, massive enthusiasm on the Democratic side after Trump's election resulted in record-setting fundraising by Democratic congressional candidates in the 2018 midterms.
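The size of the Republican lead in any cycle can be read off by pivoting the party rows into columns and subtracting. This is a minimal sketch with made-up receipt totals; the `REP_LEAD_MIL` column is a hypothetical name introduced here, and the numbers are not the FEC figures plotted above.

```python
import pandas as pd

# Toy receipts in millions of real dollars (made-up numbers)
toy = pd.DataFrame({
    'YEAR': [2016, 2016, 2018, 2018],
    'PARTY': ['Democratic', 'Republican', 'Democratic', 'Republican'],
    'ADJ_TTL_RECEIPTS_MIL': [30.0, 55.0, 90.0, 60.0],
})

# One row per year, one column per party
wide = toy.pivot(index='YEAR', columns='PARTY', values='ADJ_TTL_RECEIPTS_MIL')

# Positive values mean Republicans out-raised Democrats that cycle
wide['REP_LEAD_MIL'] = wide['Republican'] - wide['Democratic']
```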
alt.vconcat(yearly_reptype, yearly_demtype).resolve_scale(color='independent').properties(
title={
'text':'Democrats lagged Republicans in fundraising for US House until massive grassroots enthusiasm broke records in 2018',
'subtitle':'Total campaign receipts by US House campaigns in Texas, constant 2018 dollars',
'subtitleFont':'Calibri'
}
)
Source: Federal Election Commission (FEC) all-candidate receipt data
This plot shows total campaign fundraising broken down into individual contributions and other (mostly PAC/committee) fundraising. Republicans outstripped Democrats in both types of fundraising throughout the 2000s and 2010s, but the column for 2018 shows that massive grassroots enthusiasm drove a significant Democratic fundraising advantage in that year's midterms. Notably, committee spending on Democratic congressional candidates was up from 2016, but by an amount completely dwarfed by the explosion in individual contributions. Furthermore, it's interesting to note that Republican congressional candidates in Texas have not received more in individual donations than in other (committee-type) contributions since the 2010 cycle, which was a high-enthusiasm, high-turnout midterm on the Republican side.
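The individual/other breakdown is derived from two inputs per party and year: total receipts and the proportion coming from individuals. As an alternative to building two tables and concatenating them, the same long-format result could be produced with `melt`. The sketch below uses made-up totals and assumes `PERC_INDIV_CONTRIB` is stored as a proportion between 0 and 1.

```python
import pandas as pd

# Toy totals (made-up numbers), in millions of real dollars
toy = pd.DataFrame({
    'YEAR': [2016, 2018],
    'PARTY': ['Democratic', 'Democratic'],
    'ADJ_TTL_RECEIPTS_MIL': [40.0, 100.0],
    'PERC_INDIV_CONTRIB': [0.5, 0.75],  # assumed to be a 0-1 proportion
})

# Split each total into an individual part and everything else
toy['Individual'] = toy['ADJ_TTL_RECEIPTS_MIL'] * toy['PERC_INDIV_CONTRIB']
toy['Other'] = toy['ADJ_TTL_RECEIPTS_MIL'] - toy['Individual']

# Reshape to one row per year, party and contribution type
long = toy.melt(id_vars=['YEAR', 'PARTY'], value_vars=['Individual', 'Other'],
                var_name='TYPE', value_name='ADJ_TTL_RECEIPTS')
```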
I'm curious how total campaign receipts differ across levels of federal office. This section will plot relative contribution amounts by office (House, Senate and the Presidency) for all of the USA, not just Texas.
officetype = pd.read_csv('year_party_office_receipts_all.csv')
officetype.head()
alt.Chart(officetype).mark_circle(
opacity=0.75,
stroke='black',
strokeWidth=1
).encode(
alt.X('YEAR:O', axis=alt.Axis(labelAngle=0, title='Year')),
alt.Y('CAND_OFFICE:N', axis=alt.Axis(title='Office'), scale=alt.Scale(domain=['President', 'House', 'Senate'])),
alt.Size('TTL_INDIV_CONTRIB_2018:Q',
legend=alt.Legend(title=['Individual contributions,', '2018 dollars'],
orient='right', padding=5, fillColor='white',
strokeColor='black', titleLineHeight=12,
values=[10000000,200000000,600000000]),
scale=alt.Scale(range=[0, 2000])),
alt.Color('CAND_OFFICE:N', legend=None,
scale=alt.Scale(range=['#d279a6', '#00ccff', '#ff9900'])),
alt.Column('CAND_PTY_AFFILIATION:N', title=' ')
).properties(
width=350,
height=260
).transform_filter(
alt.datum.CAND_OFFICE != 'Other/Unknown'
).properties(
title = {
'text':'Presidential campaigns receive the most contributions, but congressional campaigns are catching up',
'subtitle':'Individual contributions by year and federal office level, all states',
'anchor':'start',
'subtitleFont':'Calibri'
}
)
Source: Federal Election Commission (FEC) all-candidate receipt data
As we see, presidential campaigns outstripped congressional campaigns in terms of fundraising throughout the 2000s and into the 2010s, but congressional campaigns, at both the House and Senate levels, have been catching up for the last few years. We see yet more evidence of Democrats' enthusiasm in the size of their contribution bubbles in the 2018 cycle.
Much ado was made in the media during the 2018 election about how much of Beto O'Rourke's Senate campaign funds were raised from out-of-state donors. The 2018 FEC data was too large to load into Python (I'm working on it!), so I used the 2016 cycle to investigate the extent to which campaign contributions to Texan congressional candidates come from out of state, and which states contribute the most.
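The summary file loaded below is already aggregated, and the aggregation step itself is not shown in this notebook. A plausible version of it, sketched on made-up contribution records, would drop in-state donors and sum the remainder by party and donor state. The `AMOUNT` column and every value here are hypothetical.

```python
import pandas as pd

# Toy contribution records (made-up); STATE is the donor's state
toy = pd.DataFrame({
    'CAND_PTY_AFFILIATION': ['DEM', 'DEM', 'DEM', 'REP', 'REP'],
    'STATE': ['TX', 'CA', 'CA', 'TX', 'VA'],
    'AMOUNT': [500.0, 250.0, 100.0, 1000.0, 300.0],
})

# Keep only donations from outside Texas
out = toy[toy['STATE'] != 'TX']

# Total out-of-state dollars by candidate party and donor state
summary = (out.groupby(['CAND_PTY_AFFILIATION', 'STATE'], as_index=False)['AMOUNT']
           .sum()
           .rename(columns={'AMOUNT': 'TOTAL_CONTRIB'}))
```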
ids = pd.read_csv('state_ids.csv')
out_of_state = pd.read_csv('out_of_state_summary.csv')
out_of_state_16 = out_of_state[out_of_state['CYCLE']==2016]
out_of_state_16h = out_of_state_16[out_of_state_16['CAND_OFFICE']=='H']
usa_chloro = ids.merge(out_of_state_16h, how='left', on='STATE')
usa_chloro_dem = usa_chloro[usa_chloro['CAND_PTY_AFFILIATION']=='DEM']
usa_chloro_rep = usa_chloro[usa_chloro['CAND_PTY_AFFILIATION']=='REP']
states = alt.topo_feature(data.us_10m.url, 'states')
usa_chloro_dem.head()
dem16_donations = alt.Chart(states).mark_geoshape(stroke='white').project(
type='albersUsa'
).transform_lookup(
lookup='id',
from_=alt.LookupData(usa_chloro_dem, 'id', ['TOTAL_CONTRIB'])
).encode(
color=alt.Color('TOTAL_CONTRIB:Q',
legend=alt.Legend(title=['Individual out-of-', 'state contributions,', '2016 dollars'],
orient='right', padding=5, fillColor='white',
strokeColor='black', titleLineHeight=12))
).properties(
width=500,
height=300
)
outline = alt.Chart(states).mark_geoshape(stroke='black', fillOpacity=0).project(
type='albersUsa'
).properties(
width=500,
height=300
)
alt.layer(dem16_donations, outline).properties(
title = {
'text':'Out-of-state donations to Texan Democratic candidates for US House',
'subtitle':'2016 campaign cycle',
'anchor':'start',
'subtitleFont':'Calibri'
}
).configure_view(strokeOpacity=0)
Source: Federal Election Commission (FEC) all-candidate receipt data
Here we see that the most out-of-state contributions to Texan Democratic congressional candidates in the 2016 cycle came from Georgia (probably a few anomalous high-dollar donors). Outside of Georgia, California, New York and Florida were the highest-donating states to Texan candidates, which makes sense: all three of these states have sizable bases of affluent liberals. California and New York in particular reliably send Democratic delegations to Congress, so residents in those states may have felt that their dollars were better spent in competitive races in places like Texas.
rep16_donations = alt.Chart(states).mark_geoshape(stroke='white').project(
type='albersUsa'
).transform_lookup(
lookup='id',
from_=alt.LookupData(usa_chloro_rep, 'id', ['TOTAL_CONTRIB'])
).encode(
color=alt.Color('TOTAL_CONTRIB:Q',
legend=alt.Legend(title=['Individual out-of-', 'state contributions,', '2016 dollars'],
orient='right', padding=5, fillColor='white',
strokeColor='black', titleLineHeight=12))
).properties(
width=500,
height=300
)
outline = alt.Chart(states).mark_geoshape(stroke='black', fillOpacity=0).project(
type='albersUsa'
).properties(
width=500,
height=300
)
alt.layer(rep16_donations, outline).properties(
title = {
'text':'Out-of-state donations to Texan Republican candidates for US House',
'subtitle':'2016 campaign cycle',
'anchor':'start',
'subtitleFont':'Calibri'
}
).configure_view(strokeOpacity=0)
Source: Federal Election Commission (FEC) all-candidate receipt data
Intriguingly, the pattern of spending by out-of-staters on Texan congressional campaigns is nearly the same for Republicans as for Democrats. Virginia has replaced Georgia as the highest-donating state, but California, New York and Florida remain in the mix. Perhaps Republicans in those states determined that their dollars were a lost cause in such strong liberal bastions, and they opted to send their donations to Texas instead.